Computer and Modernization ›› 2012, Vol. 198 ›› Issue (2): 31-34. doi: 10.3969/j.issn.1006-2475.2012.02.009

• Algorithm Design and Analysis •

A Text Clustering Algorithm for Short Message

WU Yong, XU Feng

  1. Department of Information Engineering, Hunan Mechanical & Electrical Polytechnic, Changsha 410151, China
  • Received: 2011-10-27 Revised: 1900-01-01 Online: 2012-02-24 Published: 2012-02-24

Abstract: For clustering short message texts, this paper designs a hybrid method that combines frequent term sets with the Ant-Tree algorithm. The method exploits the efficiency of frequent-term-set clustering on text data to generate the initial clusters, then computes silhouette coefficients to eliminate overlapping documents. On this basis, the Ant-Tree algorithm refines the clusters further to produce a high-quality result. The clustering results also retain descriptive information and a tree-like hierarchical structure, which supports a wider range of applications.

Key words: frequent term sets, Ant-Tree algorithm, silhouette coefficient, short message, clustering
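
The abstract does not say how the silhouette coefficient is used to eliminate overlapping documents, so the following is only a minimal sketch under stated assumptions. It assumes that documents are represented as sets of terms, that distance is the Jaccard distance, and that a document belonging to several initial clusters is kept only in the cluster where its silhouette value is highest. The names `jaccard_distance`, `mean_distance`, `silhouette`, and `remove_overlap` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Set


def jaccard_distance(x: Set[str], y: Set[str]) -> float:
    """Return 1 - |x & y| / |x | y|; two empty sets count as identical."""
    union = x | y
    if not union:
        return 0.0
    return 1.0 - len(x & y) / len(union)


def mean_distance(doc: int, members: Set[int], docs: List[Set[str]]) -> float:
    """Mean distance from `doc` to the cluster members other than itself."""
    others = [m for m in members if m != doc]
    if not others:
        return 0.0
    return sum(jaccard_distance(docs[doc], docs[m]) for m in others) / len(others)


def silhouette(doc: int, home: str, clusters: Dict[str, Set[int]],
               docs: List[Set[str]]) -> float:
    """Silhouette of `doc` when it is treated as a member of cluster `home`."""
    if not clusters[home] - {doc}:
        return 0.0  # usual convention for a singleton cluster
    a = mean_distance(doc, clusters[home], docs)
    # Mean distance to every other non-empty cluster, ignoring `doc` itself
    # (it may also sit in those clusters because the clusters overlap).
    rivals = [mean_distance(doc, members, docs)
              for label, members in clusters.items()
              if label != home and members - {doc}]
    if not rivals:
        return 0.0
    b = min(rivals)
    denom = max(a, b)
    return 0.0 if denom == 0 else (b - a) / denom


def remove_overlap(clusters: Dict[str, Set[int]],
                   docs: List[Set[str]]) -> Dict[str, Set[int]]:
    """Assumed rule: keep each shared document only in its best-scoring cluster."""
    result = {label: set(members) for label, members in clusters.items()}
    for doc in range(len(docs)):
        homes = [label for label, members in clusters.items() if doc in members]
        if len(homes) < 2:
            continue
        # Scores are computed against the original (overlapping) clusters,
        # so the outcome does not depend on the order documents are visited.
        best = max(homes, key=lambda label: silhouette(doc, label, clusters, docs))
        for label in homes:
            if label != best:
                result[label].discard(doc)
    return result


# Toy data (illustrative only): cluster labels stand in for frequent term sets.
docs = [
    {"meeting", "tomorrow", "office"},
    {"meeting", "office", "room"},
    {"lunch", "tomorrow", "noon"},
    {"lunch", "noon", "cafe"},
    {"meeting", "office", "lunch"},  # supports both "meeting" and "lunch"
]
clusters = {"meeting": {0, 1, 4}, "lunch": {2, 3, 4}}
result = remove_overlap(clusters, docs)
```

In this toy input, document 4 initially belongs to both clusters; it is closer on average to the other "meeting" documents than to the "lunch" documents, so it is kept in "meeting" and removed from "lunch". The disjoint clusters produced this way would then be the starting point for a refinement step such as Ant-Tree, which is not sketched here.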

CLC Number: